Abstract

Undirected labeled graphs and graph rewriting are natural models of chemical compounds and 
chemical reactions. This provides a basis for exploring spaces of molecules and computing reaction 
networks implicitly defined by graph grammars. Molecule graphs are connected, meaning that rewriting 
steps in general are many-to-many graph transformations. Chemical grammars are typically subject 
to combinatorial explosion, however, making it often infeasible to compute the underlying network by 
direct breadth-first expansion. 

To alleviate this problem, we introduce here partial applications of rules as a basis for the efficient
implementation of strategies that are not only well suited for the exploration of chemistries defined by graph
grammars, but are also applicable in a general graph rewriting context. As a showcase,
we use a complex chemistry based on the Diels-Alder reaction to explore specific subspaces of the
molecular space. As a non-chemical application, we use the framework of exploration strategies to model
an abstract graph rewriting problem, constructing high-level transformations that cannot be directly
represented in the Double-Pushout formalism, starting from simple DPO transformation rules.

1 Introduction 

The structural formulae of chemical compounds are graphs that represent the connectivity and mutual 
arrangements of the atoms. Atom types are given as vertex labels, while edges represent bond types. At 
this level of modelling, chemical reactions are naturally represented as graph transformations. Chemical 
reactions are explained and categorized in terms of reaction mechanisms that encapsulate the local changes 
of chemical bonds. In the formal framework of graph grammars, reaction mechanisms correspond to the 
productions (rules). Because of this conceptual alignment between chemistry and graph grammars, a variety 
of artificial chemistry models of different degrees of chemical realism have been devised on this basis [B]. Of
course, these purely combinatorial models of chemistry have their limitations. Deliberately disregarding
the spatial embedding of molecules, they cannot capture many aspects of stereochemistry, and they are
restricted to (over)simplified models of reaction energies and reaction kinetics. Graph grammar models are
nevertheless of practical interest when the task is to explore large areas of chemical spaces and they provide 
a means of analyzing regularities in very large reaction networks. 
Several graph rewriting tools exist, and overviews are available in the literature. The areas of
application include model checking and verification, proof representation, and modeling the control flow of
programs, among many others. A strategy language to control the application of graph rewriting rules
has been presented in [H] for PORGY [11[T7]. In a chemical setting, however, a strategy framework for
exploring chemical spaces has very different needs. An application of a graph grammar rule might merge
and split graphs, i.e., a strategy framework needs a chemically motivated handling of connected components. Decisions on
how to expand the space are usually heavily influenced by chemical properties or additional data sources.
Furthermore, the goal of an analysis might also be motivated by a chemical question, such as the detection of
chemical subspaces or finding specific chemical transformation patterns. The need for a much more
systematic exploration of chemical spaces has also been identified in chemistry and is discussed, for example,
in [7].

The outline of the paper is as follows. The formal framework, including the Double Pushout approach, is
introduced in Section 2. In our chemical setting, partial rule applications are used; these are described
in Section 3. General strategies are introduced in Section 4. In Section 5 we briefly comment on
the implementation of the strategy framework. We apply it to a complex chemical setting, namely
the Diels-Alder reaction, and furthermore show results for an abstract graph rewriting problem in order to
illustrate the concept of weakly connected subspaces. Results for these two settings are given in Section 6,
and we conclude in Section 7.



2 Formal Framework 

2.1 Chemical Graph Rewriting with the Double Pushout Approach 

Molecules are always represented by connected graphs. Chemical reactions, however, more often than not, 
involve two or more interacting molecules as their "input" (educts) and there is no guarantee that the 
"output" (products) is connected. Thus we have to consider graph transformations that operate on not 
necessarily connected graphs. More precisely, we regard a graph $G$ here as a multiset $\{g_1, g_2, \dots, g_{\#G}\}$ of
its $\#G$ connected components. All graphs are simple. Double and triple bonds are viewed as edge labels
rather than multiple edges.

Several abstract formalisms for graph transformation have been explored in the literature, see e.g. [18]
for a detailed introduction. We found that the so-called Double Pushout (DPO) approach provides the 
most intuitive direct encoding of chemical reactions and the closest connection to the language of chemistry. 

A DPO transformation rule $p = (L \xleftarrow{l} K \xrightarrow{r} R)$ consists of three graphs $L$, $R$ and $K$, known as the left,
right and context graph, respectively, and two graph morphisms $l$ and $r$ that determine how the context is
embedded in the left and the right graph. The rule $p$ can be applied to a graph $G$ if the left graph $L$ can be
found in $G$ and some additional consistency conditions are satisfied. This is modeled by the requirement
that there is a matching morphism $m : L \to G$ that describes how $L$ is contained in $G$. Intuitively, the copy
of $L$ is replaced within $G$ by $R$ in such a way that the context $K$ is left intact, resulting in the transformed
graph $H$. This operation, the derivation $G \overset{p,m}{\Longrightarrow} H$, is described in the framework of category theory by the
requirement that the following commutative diagram exists:



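\[
\begin{array}{ccccc}
L & \xleftarrow{\;l\;} & K & \xrightarrow{\;r\;} & R \\
\downarrow{\scriptstyle m} & & \downarrow{\scriptstyle d} & & \downarrow{\scriptstyle n} \\
G & \longleftarrow & D & \longrightarrow & H
\end{array}
\]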



The derivation $G \overset{p,m}{\Longrightarrow} H$ implicitly defines the intermediary graph $D$ and the result graph $H$ as well as
morphisms $d : K \to D$ and $n : R \to H$ that fix how the context and the right graph of the rule are embedded
in the intermediary and the result graph, respectively. In terms of molecules (connected components) we
can write $\{g_1, g_2, \dots, g_{\#G}\} \overset{p,m}{\Longrightarrow} \{h_1, h_2, \dots, h_{\#H}\}$.

In applications to modeling chemistry, several additional requirements must be satisfied. Conservation
of mass and atom types dictates that the restrictions of $r$ and $l$ to the vertex sets (atoms) are bijective.
Furthermore, $m$ (and by extension $d$ and $n$) are subgraph isomorphisms and hence injective. We note in
passing that this guarantees the existence of a bijection $a : V(G) \to V(H)$ known as the atom mapping. In
the DPO formalism, furthermore, the existence of an inverse production $p^{-1} = (R \xleftarrow{r} K \xrightarrow{l} L)$, corresponding
to the reverse chemical reaction, is guaranteed. Some more basic properties of chemical graph grammars
can be found in [1]. Fig. 1 shows an example of a chemical derivation.

Figure 1: Example of a chemical derivation from cyclohexadiene and isoprene using a Diels-Alder
transformation. The edges changed by the transformation are shown in red and the vertices from $K$ are shown in
green. Note that edges drawn in parallel are a single edge in the underlying graphs, with a special label to
encode a specific chemical bond.

Figure 2: A bipartite graph notation, in which the production itself is drawn as a special type of intermediate
vertex, is used in most cases; (a) $\{g_1, g_2\} \Rightarrow g_3$. We only make an exception for 1-to-1 transformations
(isomerization reactions); (b) $g_4 \Rightarrow g_5$. Multiplicities are indicated by multiple arcs; (c) $\{g_6, g_6\} \Rightarrow g_7$.



2.2 Proper Derivations 

Consider a valid derivation $\{g_1, g_2\} \overset{p,m}{\Longrightarrow} \{h_1, h_2\}$ and an arbitrary graph $g'$. Clearly, the derivation
$\{g_1, g_2, g'\} \overset{p,m}{\Longrightarrow} \{h_1, h_2, g'\}$ is also valid because the images of $m$ and $n$ are contained in $\{g_1, g_2\}$ and
$\{h_1, h_2\}$, respectively. The graph $g'$ is irrelevant for the transformation. We call a derivation $G \overset{p,m}{\Longrightarrow} H$
proper if $\operatorname{img} m \cap g_i \neq \emptyset$ for all $g_i \in G$. It is not hard to see that the inverse of a proper derivation is again
proper.
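
As an illustration of the properness condition, the following is a minimal Python sketch. It assumes, purely for the example, that each connected component is given as a set of vertex identifiers (disjoint between components) and that the image of the matching morphism is given as a set of host vertices. The function name `is_proper` is hypothetical and not part of the framework.

```python
def is_proper(match_image, components):
    """Return True if the image of the matching morphism m touches every
    connected component g_i of the educt multiset G.

    match_image: set of host vertices hit by m (img m)
    components:  list of vertex sets, one per connected component g_i
    """
    return all(match_image & component for component in components)


g1, g2, g_extra = {1, 2, 3}, {4, 5}, {6, 7}
print(is_proper({2, 4}, [g1, g2]))           # both components are used
print(is_proper({2, 4}, [g1, g2, g_extra]))  # g_extra is only a bystander
```

The second call reports a non-proper derivation because `g_extra` plays the role of the irrelevant graph $g'$ above.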

Throughout the following sections we will assume every derivation to be proper, unless otherwise stated. 



2.3 Derivation Graphs 

Chemical reaction networks can be represented as directed (multi)hypergraphs whose vertices are the
molecules of the "chemical universe" under consideration and whose hyperedges represent chemical reactions
[21]. Here, it is important to consider hyperedges as multisets to accommodate the stoichiometric coefficients,
i.e., the multiplicities in which molecules enter a chemical reaction such as $2\,\mathrm{H_2} + \mathrm{O_2} \to 2\,\mathrm{H_2O}$. Such
networks can be constructed from experimentally observed data. An example is the Network of Organic
Chemistry (NOC) [IKTHIIII], which shows a non-trivial organization concentrated around a core region of
about 300 synthetically important building blocks and industrial compounds. Metabolic networks consist
of the enzymatically catalyzed reactions constituting the chemical basis of modern life forms. They are
available from dedicated databases, see e.g. [13].

In the framework of graph grammar models, an analogous derivation graph can be defined. Its vertex set
consists of the connected labeled graphs $\mathcal{G}$ that represent the molecules. Directed hyperedges connect the
multisets $G \subseteq \mathcal{G}$ and $H \subseteq \mathcal{G}$ only if there is a proper derivation $G \Rightarrow H$. The conventions for visualizing
hyperedges adhere to the three examples in Fig. 2.

Figure 3: Partial application of some rule $p = (L \xleftarrow{l} K \xrightarrow{r} R)$ to a graph $G$, with $L = \{l_1, l_2, l_3\}$ and
$R = \{r_1, r_2\}$. The partial application is done through a partial matching morphism $\hat{m} : \hat{L} \to G$ with
$\hat{L} = \{l_1, l_2\}$. The application results in a new rule, $p_G = (L' \leftarrow K' \rightarrow R_G)$ with $L' = \{l_3\}$, for which $R$ is
a subgraph of $R_G$. The transformed graph of $G$, called $H$, is also a subgraph of $R_G$. The figure also contains the diagram
of subgraph relations for a general partial rule application.

3 Transformation by Partial Rule Application 

The core strategy to expand the underlying derivation graph is the discovery of new graphs by means
of proper derivations implied by the direct application of rules. Given a rule $p = (L \xleftarrow{l} K \xrightarrow{r} R)$ and a
set of graphs $\mathcal{U}$, the task is to find all proper derivations $G \overset{p}{\Rightarrow} H$, $G \subseteq \mathcal{U}$, where $G$ and $H$ are multisets
of graphs. This can be done by testing all $k$-multisubsets of $\mathcal{U}$ for all $1 \le k \le \#L$. Since nearly
all chemical reactions are mono-molecular or bi-molecular, we can restrict ourselves to $\#L \le 2$, at least
when elementary reactions are of primary interest. Still, the number of multisets is $O(|\mathcal{U}|^2)$. In the worst
case, all unique multisets may give successful transformations, often leading to a combinatorial explosion
that quickly becomes unmanageable. In the following section we show that a more detailed control of the
multisets that are considered for transformation is desirable.
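
To make the quadratic growth concrete, here is a small Python sketch of the naive candidate enumeration described above. The helper name `candidate_multisets` and the string stand-ins for graphs are assumptions for illustration only; a real implementation would still have to test each candidate for a proper derivation.

```python
from itertools import combinations_with_replacement


def candidate_multisets(universe, max_size):
    """Yield every k-multisubset of `universe` for 1 <= k <= max_size."""
    for k in range(1, max_size + 1):
        yield from combinations_with_replacement(universe, k)


universe = ["g1", "g2", "g3", "g4"]      # stand-ins for molecule graphs
candidates = list(candidate_multisets(universe, 2))

# n singletons plus n(n+1)/2 pairs (a pair may use the same graph twice)
n = len(universe)
print(len(candidates), n + n * (n + 1) // 2)
```

For $\#L \le 2$ the count is $n + n(n+1)/2$ with $n = |\mathcal{U}|$, which is the quadratic bound stated above.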

The key concept is partial rule composition, i.e., the binding of graphs to rules, resulting in partial
rules that can be applied more efficiently in an exploration strategy. The idea is analogous to partial 
evaluation of functions by binding some of the variables. Full graph transformations are computed as 
repeated partial rule application in this framework. For the sake of brevity, we only sketch the idea here 
and omit a complete formal definition of partial rules. 

A partial rule application of a rule $p = (L \xleftarrow{l} K \xrightarrow{r} R)$ with $L = \{l_1, l_2, \dots, l_{\#L}\}$ to a graph $G$ is a
generalization of a full transformation of $G$ in which some components of $L$ may remain unmatched, as long as
at least one component matches $G$. Thus $L$ is partitioned into the matching part $\hat{L} \neq \emptyset$ and the non-matching remainder $L'$. The restriction
$\hat{l}$ of $l : K \to L$ to the pre-image of $\hat{L}$ defines the partial transformation rule $\hat{p} = (\hat{L} \xleftarrow{\hat{l}} \hat{K} \xrightarrow{\hat{r}} \hat{R})$. Using
the restricted matching morphism $\hat{m} : \hat{L} \to G$, it can be applied to $G$, resulting in a graph $H$. The remainder
$L'$ of $L$ gives rise to a new rule $p_G = (L' \xleftarrow{l'} K' \xrightarrow{r'} R_G)$ whose right graph consists of the transformed
version of $G$ as well as the original right graph $R$, i.e., it contains both $H$ and $R$ as subgraphs. A formal,
diagrammatic representation is given in Fig. 3c. An abstract partial application is shown in Fig. 3a and 3b.

Given a not necessarily connected graph $G$ and a DPO transformation rule $p = (L \xleftarrow{l} K \xrightarrow{r} R)$, our task
is to construct all partial rules obtainable by binding $G$ to $p$. These partial rules can then be applied to
further graphs, allowing for more efficient exploration strategies. The following algorithm enumerates these
partial rules:

1. For all $l_i \in L$ find the set of all subgraph isomorphisms of $l_i$ to $G$. That is, find $M_i = \{m \mid m : l_i \to
G \text{ is a subgraph isomorphism}\}$ for $1 \le i \le \#L$.

2. For all nonempty subsets $\hat{L}$ of $L$, construct all partial matching morphisms, $\hat{m}$, by merging morphisms
from each $M_j$, $l_j \in \hat{L}$. Note that each $\hat{m}$ must be injective.

3. For each partial matching morphism, $\hat{m}$, apply $p$ to $G$ with $\hat{m}$ to obtain a new rule $p_G = (L' \leftarrow
K' \rightarrow R_G)$.
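
Step 2 of the algorithm above can be sketched in Python as follows. This is only an illustration under stated assumptions: the sets $M_i$ from step 1 are taken as given (in the paper they come from subgraph isomorphism search), each morphism is represented as a dictionary from pattern vertices to vertices of $G$, the components of $L$ have disjoint vertex names, and the function name `partial_matches` is hypothetical.

```python
from itertools import combinations, product


def partial_matches(morphisms_per_component):
    """Enumerate merged, injective partial matching morphisms.

    morphisms_per_component[i] plays the role of M_i: a list of dicts mapping
    the vertices of component l_i to vertices of G (each assumed injective).
    Yields (chosen component indices, merged morphism).
    """
    count = len(morphisms_per_component)
    for size in range(1, count + 1):                 # nonempty subsets of L
        for chosen in combinations(range(count), size):
            pools = (morphisms_per_component[i] for i in chosen)
            for combo in product(*pools):            # one morphism per component
                merged = {}
                for morphism in combo:
                    merged.update(morphism)
                # injective iff no two pattern vertices share an image in G
                if len(set(merged.values())) == len(merged):
                    yield chosen, merged


M = [
    [{"a": 1}, {"a": 2}],   # M_1: two ways to place component l_1
    [{"b": 1}],             # M_2: one way to place component l_2
]
for chosen, morphism in partial_matches(M):
    print(chosen, morphism)
```

The combination that maps both `a` and `b` to host vertex 1 is rejected by the injectivity check, so four partial matches remain; the one covering both components is a full matching morphism.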

The partial matching morphisms constructed from considering $\hat{L} = L$ are actually full matching morphisms,
and so the resulting rule has $L' = K' = \emptyset$. In this case $p_G$ represents the creation of $R_G$ from an empty
graph, and $G \Rightarrow R_G$ is a valid derivation. If $G$ is connected, the derivation will additionally be proper.

Figure 4: Illustration of the evaluation of $F_2 = p(F_1)$ for $F_1 = p(F)$ and some set of graphs, $S(F) = U(F) =
\mathcal{U}$. Each derivation must use at least one graph from the input subset. Two abstract derivations are shown
with the endpoints indicating in which sets the graphs are.

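In the following section we will regard a rule $p$ as a function on sets of graphs, defined provisionally as:

\[
p(\mathcal{U}) = \mathcal{U} \cup \bigcup_{\substack{G \subseteq \mathcal{U} \\ G \overset{p}{\Rightarrow} H}} H
\]

That is, the result of applying $p$ to a set of graphs, $\mathcal{U}$, is $\mathcal{U}$ itself along with all graphs derivable from $\mathcal{U}$
using $p$.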

3.1 Complex Graph States 

Consider the problem of applying a rule $p$ twice to a set of graphs $\mathcal{U}$. That is, finding $\mathcal{U}_2 = p(\mathcal{U}_1)$ for
$\mathcal{U}_1 = p(\mathcal{U})$. By our definition of rule application we have $\mathcal{U} \subseteq \mathcal{U}_1$, so when the algorithm described above is
used for evaluating $p(\mathcal{U}_1)$ it will find not only new derivations but also all derivations found when evaluating
$p(\mathcal{U})$. We therefore use a more complex state than simply sets of graphs. A graph state $F$ is defined as a
pair of ordered sets of graphs $(U, S)$ with $S \subseteq U$. The elements, $U$ and $S$, will also be referred to as $U(F)$
and $S(F)$ respectively, where $U$ and $S$ are functions on the graph state. In the following we will refer to
$U(F)$ as the universe of the graph state $F$ and to $S(F)$ as the subset of the state. The order of graphs in the
subset and in the universe is independent and is arbitrary unless otherwise stated.

We define the application of a rule $p$ to a graph state $F$ in the following manner. Let $H'$ be all connected
graphs derivable from $U(F)$ with $p$ such that at least one graph from $S(F)$ is being transformed in each
derivation:

\[
H' = \{\, h \in H \mid G \overset{p}{\Rightarrow} H : G \subseteq U(F) \wedge G \cap S(F) \neq \emptyset \,\} \qquad (2)
\]

The result $F' = p(F)$ is such that

\[
\begin{aligned}
U(F') &= U(F) \cup H' \\
S(F') &= H' \setminus U(F)
\end{aligned} \qquad (3)
\]

That is, the resulting universe contains the input universe and all derived graphs, and the resulting subset
contains all new graphs which were not known before. The removal of known graphs from the output subset
is motivated by the goal of exploring the underlying network of derivations. However, this specific behaviour
is not always desired, so an alternate definition can be used (see Section 4.7).

With the definition above we rewrite our initial example as: find $F_2 = p(F_1)$ for $F_1 = p(F)$ and
$S(F) = U(F) = \mathcal{U}$. The application $p(F_1)$ can now only discover derivations with at least one graph from
$S(F_1)$, which by definition contains only new graphs. Therefore, only new derivations are found. Fig. 4
contains a visualization of the example.

The implementation uses the partial rule application algorithm for this transformation: the rule is first partially
applied to the subset of the input state, and the resulting partial rules are then applied to the full universe.
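
The semantics of Eqs. (2) and (3) can be sketched in Python as below. This is a deliberately naive illustration, not the implementation described above: it enumerates candidate multisets by brute force instead of using partial rule application, represents graphs as strings, and models a rule as a hypothetical callback `rule(educts)` that returns the list of product graphs (or an empty list if the rule does not apply).

```python
from itertools import combinations_with_replacement


def apply_rule(rule, universe, subset, max_educts=2):
    """Return (U', S') following Eqs. (2) and (3)."""
    in_subset = set(subset)
    derived = []                                   # H', in discovery order
    for k in range(1, max_educts + 1):
        for educts in combinations_with_replacement(universe, k):
            if not in_subset.intersection(educts):
                continue                           # G must intersect S(F)
            for h in rule(educts):
                if h not in derived:
                    derived.append(h)
    known = set(universe)
    new_subset = [h for h in derived if h not in known]   # H' \ U(F)
    return list(universe) + new_subset, new_subset


def join(educts):
    """Toy rule: merge exactly two 'molecules' by concatenation."""
    return ["".join(educts)] if len(educts) == 2 else []


u1, s1 = apply_rule(join, ["a", "b"], ["a", "b"])
print(u1, s1)
u2, s2 = apply_rule(join, u1, s1)   # pairs taken only from {"a", "b"} are skipped
print(len(s2))
```

In the second application the pair `("a", "b")` is never retried, because neither graph is in the subset any more; this is the saving that the graph state is designed to provide.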



4 Strategies 

The previous section described how a rule $p$ is applied to a state $F$ to calculate a new state $F'$, and
motivated this by the example of composition of rule application, $F' = p(p(F))$. Using the definition of a
graph state, we generalize the interface for rule application into general strategies. A strategy is simply any
function $Q$ from and to the set of graph states.

In the following we introduce core strategies defined in the framework. Most of the strategies are
parameterized, which we will note with brackets around these parameters. The application of a strategy $Q$
with some fixed parameter, $n$, to a graph state $F$ is thus denoted as $Q[n](F)$.

Figure 5: Graphs and transformation rule for the example of the semantics of revive strategies.

4.1 Parallel 

A parallel strategy is defined in terms of a set of substrategies, $\{Q_1, Q_2, \dots, Q_n\}$. The result of applying a
parallel strategy is the union of the results from applying the individual substrategies:

\[
\begin{aligned}
F' &= \mathrm{parallel}[\{Q_1, Q_2, \dots, Q_n\}](F) \\
U(F') &= \bigcup_{1 \le i \le n} U(Q_i(F)) \\
S(F') &= \bigcup_{1 \le i \le n} S(Q_i(F))
\end{aligned}
\]

The order of the resulting universe and subset is arbitrary.

A simple example of the use of a parallel strategy is the application of multiple rules to the same set of 
graphs. 

4.2 Sequence 

A sequence strategy, $Q$, is a composition of a list of substrategies, $Q_1, Q_2, \dots, Q_n$:

\[
Q(F) = Q_n(\dots(Q_2(Q_1(F))))
\]

To increase left-to-right readability of sequence strategies, we will use the notation $Q = Q_1 \to Q_2 \to \dots \to
Q_n$. Additionally, if $Q_1 = Q_2 = \dots = Q_n = Q'$, we may use the normal notation for powers of functions,
$Q = Q'^n$, for the sequence.
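
The parallel and sequence strategies can be sketched as higher-order functions in Python. The sketch assumes a graph state is a pair `(universe, subset)` of lists and a strategy is any function from state to state; the names `parallel`, `sequence`, `ordered_union` and the toy strategy `add` are illustrative choices, not the library's API.

```python
def ordered_union(lists):
    """Union of several lists, keeping first-seen order."""
    seen, out = set(), []
    for items in lists:
        for item in items:
            if item not in seen:
                seen.add(item)
                out.append(item)
    return out


def parallel(strategies):
    """Apply every substrategy to the same input state and merge the results."""
    def run(state):
        results = [q(state) for q in strategies]
        return (ordered_union(u for u, _ in results),
                ordered_union(s for _, s in results))
    return run


def sequence(strategies):
    """Q_1 -> Q_2 -> ... -> Q_n: feed each result into the next substrategy."""
    def run(state):
        for q in strategies:
            state = q(state)
        return state
    return run


def add(graph):
    """Toy strategy that adds one graph to universe and subset."""
    return lambda state: (ordered_union([state[0], [graph]]),
                          ordered_union([state[1], [graph]]))


start = (["a"], ["a"])
print(parallel([add("x"), add("y")])(start))
print(sequence([add("x"), add("y")])(start))
```

The definition leaves the order of the merged universe and subset arbitrary; first-seen order is used here only to keep the output deterministic.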

4.3 Repetition 

The sequencing strategy only allows composition of a fixed number of strategies, whereas the repetition 
strategy is used to compose a single strategy with itself many times. 

A repetition strategy, $Q$, is parameterized by a non-negative integer, $n$, and an inner strategy $Q'$. The
inner strategy is composed with itself until the graph state reaches a fixed point or its subset is empty,
however at most $n$ times:

\[
\begin{aligned}
Q &= \mathrm{repeat}[Q', n] = Q'^k \\
k &= \min i \in \{0, 1, \dots, n\} \text{ such that } Q'^i(F) = Q'^{i+1}(F) \,\vee\, S(Q'^{i+1}(F)) = \emptyset \,\vee\, i = n
\end{aligned}
\]

This means that if the graph state reaches a fixed point then that graph state is returned, and if the subset
of the state becomes empty then the previous state is returned. We motivate this condition of a non-empty
subset of a produced graph state by our definition of rule application, which requires at least one graph
from the subset. By returning the last graph state with non-empty subset, the repetition strategy can be
used as a precomputation in a sequence to find a kind of closure under some inner strategy.

Note that for $n = 0$ the strategy becomes the identity strategy. If $n$ is set large enough to not limit the
repetition, we call it unbounded repetition, and write it as $Q = \mathrm{repeat}[Q']$.
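
A minimal Python sketch of the repetition semantics follows, again assuming states are `(universe, subset)` pairs of lists and strategies are functions on states. The name `repeat` mirrors the notation above; using `n=None` for unbounded repetition and the toy strategy `drain` are assumptions of this sketch.

```python
def repeat(inner, n=None):
    """Compose `inner` with itself until a fixed point is reached, the next
    subset would be empty, or `n` applications have been performed.
    With n=None the repetition is unbounded (and may not terminate)."""
    def run(state):
        applied = 0
        while n is None or applied < n:
            candidate = inner(state)
            if candidate == state or not candidate[1]:
                break            # fixed point, or next subset is empty
            state = candidate
            applied += 1
        return state
    return run


def drain(state):
    """Toy strategy: drop the first graph of the subset."""
    universe, subset = state
    return universe, subset[1:]


start = (["a", "b", "c"], ["a", "b", "c"])
print(repeat(drain)(start))       # stops before the subset becomes empty
print(repeat(drain, 1)(start))    # at most one application
print(repeat(drain, 0)(start))    # identity
```

Returning the last state with a non-empty subset, rather than the empty one, is what allows the result to be fed into a subsequent rule application.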

4.4 Revive 

The strategy framework is primarily aimed at generating a network representing derivations. However, it 
may be useful to use the strategies directly as functions on graphs which means that repetition strategies 
may not work as expected. 



Figure 6: Illustration of the application of repeat[$p$] to $F$ with $S(F) = U(F) = \{g_1, g_2\}$. Only the subsets
of the graph states are shown. The first application of $p$ results in two new graphs, $g_3$ and $g_4$, but as $p$ can
only be applied to $g_4$ the final subset is only a single graph, $g_5$, instead of both $g_3$ and $g_5$.



Consider the following problem. Two graphs, $g_1$ and $g_2$, and the transformation rule $p$, as illustrated in
Fig. 5, are given. We wish to develop a strategy to transform all edge labels using rule $p$, with the intent to
use this strategy as a precomputation for a subsequent strategy. That is, the result of the strategy must
contain the completely transformed graphs in the subset. From this specification we first try the strategy
$Q = \mathrm{repeat}[p]$ applied to the graph state $F$ with $S(F) = U(F) = \{g_1, g_2\}$, which may be the most intuitive
approach. However, it does not give the intended result. This is illustrated in Fig. 6.

The intention of the revive strategy is to provide a mechanism to solve this problem. Such a strategy, $Q$,
is defined in terms of an inner strategy, $Q'$, and is written as $Q = \mathrm{revive}[Q']$. A rule strategy creates a
(possibly empty) set of derivations, and by extension a strategy creates derivations. We say that a graph $g$
is consumed in strategy $Q$ if $Q$ creates a derivation $G \Rightarrow H$ with $g \in G$. With this we define the revive
strategy as:

\[
\begin{aligned}
F' &= \mathrm{revive}[Q'](F) \\
U(F') &= U(Q'(F)) \\
S(F') &= S(Q'(F)) \cup \{\, g \in S(F) \mid g \in U(F') \wedge g \text{ is not consumed in } Q' \,\}
\end{aligned}
\]

That is, any graph from the input subset which is still in the output universe and was not consumed will
be added to the output subset.

With the revive strategy, the example problem can be solved with $Q = \mathrm{repeat}[\mathrm{revive}[Q']]$, where $Q' = p$.
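
The effect of revive on the example can be reproduced with a toy Python sketch. Several things here are assumptions made only for illustration: graphs are strings, the rule rewrites the first `x` of a string to `y` (standing in for the edge relabeling of Fig. 5), and an inner strategy is modeled as a function returning `(universe, subset, consumed)` so that the set of consumed graphs is available to `revive`.

```python
def relabel_once(universe, subset):
    """Toy rule strategy: rewrite the first 'x' of each subset string to 'y'.
    Returns (universe', subset', consumed)."""
    derived, consumed = [], set()
    for g in subset:
        if "x" in g:
            consumed.add(g)
            h = g.replace("x", "y", 1)
            if h not in derived:
                derived.append(h)
    new = [h for h in derived if h not in universe]
    return universe + new, new, consumed


def revive(inner):
    """Re-add unconsumed input-subset graphs that are still in the universe."""
    def run(universe, subset):
        new_universe, new_subset, consumed = inner(universe, subset)
        revived = [g for g in subset
                   if g in new_universe
                   and g not in consumed
                   and g not in new_subset]
        return new_universe, new_subset + revived, consumed
    return run


state = (["x", "xx"], ["x", "xx"])      # plays the role of {g1, g2}
step = revive(relabel_once)
for _ in range(3):                      # three rounds reach a fixed point here
    state = step(*state)[:2]
print(state[1])
```

Without `revive`, the fully rewritten `"y"` would drop out of the subset after the second round, which is the behaviour shown in Fig. 6; with it, both completely transformed strings remain in the final subset.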



4.5 Derivation Predicates 

To allow precise modeling and to counter combinatorial explosion, it is convenient to limit
the possibilities of expansion. We define two variations of the concept of derivation predicates, which both
introduce extra constraints in Eq. (2) to prune unwanted derivations. The strategies, $\mathrm{leftPredicate}[P, Q']$
and $\mathrm{rightPredicate}[P, Q']$, are both defined in terms of a predicate, $P$, on a transformation rule and a
multiset of graphs, and an inner strategy, $Q'$. For a left predicate with $P$ and $Q'$, each derivation $G \overset{p}{\Rightarrow} H$
found by $Q'$ must satisfy $P(p, G)$. For a right predicate, $P(p, H)$ must hold instead.

As an example, given a strategy $Q'$ we wish to produce only graphs with at most 42 vertices (atoms, in a
chemical context). This can be specified with the following strategy:

\[
\begin{aligned}
Q &= \mathrm{rightPredicate}[P, Q'] \\
P(p, \{g_1, g_2, \dots, g_k\}) &\equiv \forall\, 1 \le i \le k : |V(g_i)| \le 42
\end{aligned}
\]
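
The pruning effect of derivation predicates can be illustrated with a short Python sketch. As a simplification, the predicates here filter an explicit list of derivations rather than wrapping an inner strategy as in the framework; each derivation is assumed to be a triple `(rule, educts, products)`, and each graph is assumed to be given simply as a tuple of its vertices. All names are hypothetical.

```python
def left_predicate(pred, derivations):
    """Keep derivations (rule, educts, products) whose educts satisfy pred."""
    return [d for d in derivations if pred(d[0], d[1])]


def right_predicate(pred, derivations):
    """Keep derivations (rule, educts, products) whose products satisfy pred."""
    return [d for d in derivations if pred(d[0], d[2])]


def at_most_42_vertices(rule, graphs):
    """P(p, {g_1, ..., g_k}): every graph has at most 42 vertices."""
    return all(len(vertices) <= 42 for vertices in graphs)


small = tuple(range(10))
large = tuple(range(43))
derivations = [
    ("p", (small, small), (tuple(range(20)),)),   # product with 20 vertices
    ("p", (small, large), (tuple(range(53)),)),   # product with 53 vertices
]
kept = right_predicate(at_most_42_vertices, derivations)
print(len(kept))
```

A left predicate such as `lambda rule, educts: len(educts) == 2` would, in the same way, restrict a rule to bimolecular derivations.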



4.6 Filter, Sort, Take and Add 



To facilitate more elaborate use of strategies in a functional style, we define several strategies which correspond
to functions on lists in other languages. As a graph state is composed of both a universe and a subset, all
of these strategies are defined in two variations.

A filter strategy is parameterized by a predicate on a graph and a graph state:

\[
\begin{aligned}
F' &= \mathrm{filterSubset}[P](F) \\
U(F') &= U(F) \\
S(F') &= \{\, g \in S(F) \mid P(g, F) \,\}
\end{aligned}
\]

\[
\begin{aligned}
F' &= \mathrm{filterUniverse}[P](F) \\
U(F') &= \{\, g \in U(F) \mid P(g, F) \,\} \\
S(F') &= \{\, g \in S(F) \mid P(g, F) \,\}
\end{aligned}
\]

A sorting strategy is parameterized with a predicate on two graphs and a graph state, used as a less-than
operator in a stable sort of a list of graphs:

\[
\begin{aligned}
F' &= \mathrm{sortSubset}[P](F) \\
U(F') &= U(F) \\
S(F') &= \mathrm{stableSort}[P](S(F))
\end{aligned}
\]

\[
\begin{aligned}
F' &= \mathrm{sortUniverse}[P](F) \\
U(F') &= \mathrm{stableSort}[P](U(F)) \\
S(F') &= S(F)
\end{aligned}
\]

The choice that the sorting algorithm must be stable is motivated by the desire to allow lexicographical
sorting by sequencing several sorting strategies.

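A take strategy is parameterized with a natural number:

\[
\begin{aligned}
F' &= \mathrm{takeSubset}[n](F) \\
k &= \min\{n, |S(F)|\} \\
U(F') &= U(F) \\
S(F') &= \{S(F)_1, S(F)_2, \dots, S(F)_k\}
\end{aligned}
\]

\[
\begin{aligned}
F' &= \mathrm{takeUniverse}[n](F) \\
k &= \min\{n, |U(F)|\} \\
U(F') &= \{U(F)_1, U(F)_2, \dots, U(F)_k\} \\
S(F') &= S(F) \cap U(F')
\end{aligned}
\]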



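An addition strategy appends a given set of graphs to the universe and optionally also to the
subset:

\[
\begin{aligned}
F' &= \mathrm{addSubset}[\{g_1, g_2, \dots, g_n\}](F) \\
U(F') &= U(F) \cup \{g_1, g_2, \dots, g_n\} \\
S(F') &= S(F) \cup \{g_1, g_2, \dots, g_n\}
\end{aligned}
\]

\[
\begin{aligned}
F' &= \mathrm{addUniverse}[\{g_1, g_2, \dots, g_n\}](F) \\
U(F') &= U(F) \cup \{g_1, g_2, \dots, g_n\} \\
S(F') &= S(F)
\end{aligned}
\]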



An example usage of these strategies is the procedure of ranking graphs according to some property and taking
the best $n$ graphs for subsequent expansion, i.e.:

\[
Q' = \mathrm{sortSubset}[P] \to \mathrm{takeSubset}[n]
\]

The addition strategies can be used for injecting new graphs in the middle of a strategy, but we also
find them convenient simply for uniform left-to-right writing of a strategy application. E.g., given a (large)
strategy $Q$ we wish to apply to the graph state $F$, we can write:

\[
F' := \mathrm{addUniverse}[U(F)] \to \mathrm{addSubset}[S(F)] \to Q
\]

with the interpretation $F' = Q(F)$.
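
The subset variants of filter, sort and take, and the ranking pipeline $\mathrm{sortSubset}[P] \to \mathrm{takeSubset}[n]$, can be sketched in Python as follows. The sketch assumes states are `(universe, subset)` pairs of lists; the function names are illustrative, and the less-than predicate is adapted to Python's stable `sorted` through `functools.cmp_to_key`.

```python
from functools import cmp_to_key


def filter_subset(pred):
    def run(state):
        universe, subset = state
        return universe, [g for g in subset if pred(g, state)]
    return run


def sort_subset(less):
    """Stable sort of the subset using `less(a, b, state)` as less-than."""
    def run(state):
        universe, subset = state

        def compare(a, b):
            if less(a, b, state):
                return -1
            if less(b, a, state):
                return 1
            return 0

        return universe, sorted(subset, key=cmp_to_key(compare))
    return run


def take_subset(n):
    def run(state):
        universe, subset = state
        return universe, subset[:n]     # slicing yields min(n, |S(F)|) graphs
    return run


def run_sequence(strategies, state):
    for q in strategies:
        state = q(state)
    return state


graphs = ["ccc", "a", "bb", "d"]        # strings stand in for graphs
shorter = lambda a, b, state: len(a) < len(b)
ranked = run_sequence([sort_subset(shorter), take_subset(2)], (graphs, graphs))
print(ranked[1])
```

Because the sort is stable, `"a"` stays ahead of `"d"` although both have the same rank, which is the property that makes lexicographical sorting by sequencing several sort strategies possible.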



4.7 Alternate Rule Application 

The definition of rule application described previously is aimed at exploration of the underlying space of
graphs. In particular, the definition of the output subset of rule application, Eq. (3), manipulates the graph
state such that already discovered graphs cannot initiate another derivation. To facilitate the use of the
strategy framework for more direct functional computation, it is desirable to let the resulting subset of rule
application be all derived graphs, i.e., to change Eq. (3) to $S(F') = H'$. We therefore introduce a strategy to
change this behaviour for a given inner strategy. That is, the strategy $Q = \mathrm{altRuleApp}[Q']$ evaluates as $Q'$,
but with all rule applications in $Q'$ using the alternate subset definition.



5 Implementation Remarks 

The strategies are implemented in C++ as part of a library, to allow easy extension at the user level. 
Extensions can vary from simple graph state manipulating strategies to complete replacement of the 
underlying transformation formalism. The library is aimed at chemical graph transformation, with special 
optimization for molecules (e.g., use of canonical SMILES strings for graph isomorphism [12] [20]), but
is not restricted to the domain of chemistry. The current implementation uses VF2 [5] to find subgraph
isomorphisms, and as a fall-back algorithm for isomorphism checks on general graphs. Furthermore, the
library uses data structures and procedures for molecule handling from the Graph Grammar Library
(GGL) ['3;.

An interpreter for a custom language with graphs, rules and strategies as data types, is also implemented 
to allow easy development of expansion strategies. 

Figure 7: The starting molecules, (a) isoprene and (b) cyclohexadiene, for application of the Diels-Alder
reaction. The molecules are shown in two versions: one with all vertices explicit and chemical interpretation
of edge labels (left), and one version in standard chemical visualization.



6 Results 

The Diels-Alder reaction is one of the most useful reactions in organic chemistry and has heavily influenced
total synthesis in the last decades [11]. The explosion of the chemical space by applying this reaction several
times will be biased by the strategy framework. In order to easily illustrate subspaces that are also expected 
to exist in a chemical setting, we will apply the strategy framework to a small puzzle game. 

6.1 The Diels-Alder Reaction

As a small, but complex, chemistry we here use the Diels-Alder reaction with two starting molecules. The
reaction is shown in an example derivation in Fig. 1, while the starting molecules, isoprene and cyclohexadiene,
are shown in Fig. 7. Let $p = (L \xleftarrow{l} K \xrightarrow{r} R)$ be the transformation rule modeling the Diels-Alder reaction.
The intention of the rule is that it is applied to two molecules, but this constraint is not encoded in the
rule. We therefore first wrap $p$ with a derivation predicate:

\[
Q_p = \mathrm{leftPredicate}[P, p], \qquad P(p', G) \equiv \#G = 2
\]

This means that all derivations $G \Rightarrow H$ must have $|G| = 2$.

A generic breadth-first exploration of the chemical space can be done with the following strategy:

\[
Q_{\mathrm{bfs}} = \mathrm{addSubset}[\{\text{isoprene}, \text{cyclohexadiene}\}] \to \mathrm{repeat}[Q_p, n]
\]

However, for $n = 4$ the strategy already discovers 825 new graphs through 1278 derivations^1. The number
of subgraph isomorphism queries throughout the evaluation is 74591. In Appendix A, Fig. 11, the resulting
derivation graph for just $n = 2$ is shown.

We now decide to only look at the subspace of molecules which are derived by repeatedly merging
molecules with isoprene, starting with cyclohexadiene. The following strategy implements this specification:

\[
\begin{aligned}
Q_{\mathrm{subspace}} = {} & \mathrm{addUniverse}[\{\text{isoprene}\}] \to \mathrm{addSubset}[\{\text{cyclohexadiene}\}] \\
& \to \mathrm{leftPredicate}[P_{\mathrm{init}}, Q_p] \to \mathrm{filterUniverse}[P_{\mathrm{filter}}] \\
& \to \mathrm{repeat}[Q_p, n]
\end{aligned} \qquad (4)
\]

with

\[
\begin{aligned}
P_{\mathrm{init}}(p', G) &\equiv G = \{\text{isoprene}, \text{cyclohexadiene}\} \\
P_{\mathrm{filter}}(g, F) &\equiv g \neq \text{cyclohexadiene}
\end{aligned}
\]

This first computes all possible proper derivations $\{\text{isoprene}, \text{cyclohexadiene}\} \Rightarrow H$, then removes
cyclohexadiene from the graph state to prevent further derivations. In the end it uses breadth-first expansion for
at most $n$ steps. This strategy, with $n = 3$ (i.e., 4 expansion steps including the very specific first step),
discovers only 165 new graphs through 236 derivations^2 and uses 5524 subgraph isomorphism queries. The
derivation graph with $n = 2$ is visualized in Fig. 8.

^1 In this scenario we regard derivations which only differ in the matching morphism as duplicates. The evaluation of the
strategy takes on the order of 10 seconds with an Intel® Core™ i5-2500K CPU (3.30 GHz).
^2 As in footnote 1; here the evaluation takes on the order of 8 seconds.

Figure 8: The derivation graph resulting from evaluating the expansion strategy $Q_{\mathrm{subspace}}$, Eq. (4). To
minimize clutter, the vertex with isoprene and the corresponding edges are not shown, although isoprene is
involved in every reaction (the resulting chemical reaction network is a hypergraph).

Figure 9: Level 1 of the Catalan game and the intermediary graphs during transformation to a graph with
a single vertex. 



6.2 Solving the Catalan Game 

The Catalan game [T2| is a puzzle game in which the player in each level is presented with a simple 
undirected graph without labels. The goal is to transform the graph into a single vertex using the following
rewriting rule: given a vertex $v$ with degree exactly 3, identify $v$ with its neighbours and preserve simpleness
of the graph by identifying parallel edges and deleting loops. Fig. [9] shows level 1 with the intermediary 
graphs towards the goal graph with a single vertex. 
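
For orientation, a single move of the game can be written directly in Python on an adjacency-set representation. This sketch implements the high-level rule as stated above, not the DPO-based decomposition developed in this section; the function name `catalan_move` and the dictionary encoding of the graph are assumptions of the example.

```python
def catalan_move(adj, v):
    """Contract vertex v (degree exactly 3) with its three neighbours.

    adj: dict mapping each vertex to the set of its neighbours
         (simple undirected graph). Returns a new adjacency dict in which
         the merged vertex keeps the name v.
    """
    if len(adj[v]) != 3:
        raise ValueError("v must have degree exactly 3")
    merged = adj[v] | {v}
    # Vertices adjacent to the merged group but outside it. Using sets
    # removes parallel edges; excluding `merged` removes loops.
    outside = set().union(*(adj[u] for u in merged)) - merged
    new_adj = {u: set(nbrs) - merged
               for u, nbrs in adj.items() if u not in merged}
    new_adj[v] = set(outside)
    for u in outside:
        new_adj[u].add(v)
    return new_adj


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(catalan_move(star, 0))            # a single vertex remains

kite = {0: {1, 2, 3}, 1: {0, 4}, 2: {0, 4}, 3: {0}, 4: {1, 2}}
print(catalan_move(kite, 0))            # parallel edges to vertex 4 collapse
```

The number of edges touched by one move depends on the input graph, which is the reason a single DPO rule cannot express it.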

The transformation in the game cannot be formulated as a single rule in the DPO formalism, because
such rules must explicitly match the vertices and edges which are changed, while the Catalan transformation 
needs to change arbitrarily many edges. In the following we show how the strategies can be used to 
implement a move in the game, using only DPO rules. 

Let g be the graph from some Catalan level, with all edge labels set to the empty string and all vertex 
labels set to the arbitrarily chosen label "0" . A high-level description of a move is: 

1. Find a vertex v with at least 3 neighbours and mark it by changing the label to "A". Mark the 3
matched neighbours with the label "R".

2. If possible, find another fourth neighbour of v and mark v with "FAIL".

3. Discard all graphs with a vertex with the label "FAIL".

4. For all edges e with both end-vertices having label "R", remove e.

5. For all edges ur with u having label "0" and r having label "R", add uv if it does not exist already
and then remove ur.

6. For all edges ur with u having label "0" and r having label "R", remove ur.

7. Remove all neighbours of v having label "R".

8. Unmark v by changing the label to "0".

Step 3 can be implemented with a filtering strategy while the other steps each require a transformation rule.
The following strategy can be used to solve a level, in the sense that if a graph with a single vertex with
label "0" is found, then a path to that graph is equivalent to a solution. The details of the transformation
rules (mark, markForFail, removeInterR, reattachExternal, removeAttached, removeR and unmark) are
shown in Appendix B.

Q_catalan = addSubset[{g}] → altRuleApp[repeat[ 
        mark → revive[markForFail] → filterUniverse[P_fail] 
        → repeat[revive[removeInterR]] 
        → repeat[revive[reattachExternal]] 
        → repeat[revive[removeAttached]] 
        → removeR → unmark 
]] 

P_fail(g', F) = no vertex of g' has the label "FAIL" 
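To illustrate what "solving a level" means as a search problem, the following sketch performs a plain breadth-first search over moves until a single-vertex graph is reached. It is not the strategy framework: the helper names `contract`, `freeze` and `solve` are assumptions, the move is applied directly instead of through DPO rules, and visited states are compared by exact equality rather than up to isomorphism.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Graph = Dict[int, Set[int]]  # simple undirected graph as adjacency sets


def contract(g: Graph, v: int) -> Graph:
    """One move: identify v (degree 3) with its neighbours, keeping g simple."""
    merged = g[v] | {v}
    outside = {u for m in merged for u in g[m]} - merged
    h: Graph = {u: set(n) - merged for u, n in g.items() if u not in g[v]}
    h[v] = set(outside)
    for u in outside:
        h[u].add(v)
    return h


def freeze(g: Graph) -> FrozenSet[Tuple[int, FrozenSet[int]]]:
    """Hashable snapshot of a graph, used to detect repeated states."""
    return frozenset((u, frozenset(n)) for u, n in g.items())


def solve(start: Graph) -> Optional[List[int]]:
    """Return a shortest sequence of centre vertices, or None if unsolvable."""
    queue = deque([(start, [])])
    seen = {freeze(start)}
    while queue:
        g, path = queue.popleft()
        if len(g) == 1:  # goal: a single vertex
            return path
        for v in g:
            if len(g[v]) == 3:
                h = contract(g, v)
                key = freeze(h)
                if key not in seen:
                    seen.add(key)
                    queue.append((h, path + [v]))
    return None
```

The returned list plays the role of a path in the derivation graph: each entry names the vertex chosen as the centre of one move.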

With strategy Q_catalan all 56 levels of Catalan could be solved; all but one level took less than 10 minutes 
of computation time. Fig. 10b shows, as an example, the derivation graph created when executing the strategy 
with g encoding level 25 of the game, and Fig. 10a shows the initial level graph. The resulting derivation 
graph is, in contrast to chemical reaction networks, not a hypergraph. However, the graph clearly illustrates 
subspaces that are connected via a small number of bridging edges. Such subspaces are also expected in 
chemical reaction networks. 
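Bridging edges of this kind can be located automatically. The sketch below uses the standard depth-first low-link method for finding bridges, assuming the derivation graph is viewed as a simple undirected graph given as adjacency sets; the function name `bridges` is an illustrative choice, and the text does not state how the subspaces were actually identified.

```python
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[int, Set[int]]  # simple undirected graph as adjacency sets


def bridges(g: Graph) -> List[Tuple[int, int]]:
    """Return all edges whose removal disconnects their component."""
    disc: Dict[int, int] = {}  # discovery time of each vertex
    low: Dict[int, int] = {}   # earliest discovery time reachable via one back edge
    found: List[Tuple[int, int]] = []

    def dfs(u: int, parent: Optional[int]) -> None:
        disc[u] = low[u] = len(disc)
        for w in g[u]:
            if w == parent:  # valid because the graph has no parallel edges
                continue
            if w in disc:
                low[u] = min(low[u], disc[w])
            else:
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] > disc[u]:  # nothing below w reaches u or above
                    found.append((u, w))

    for s in g:
        if s not in disc:
            dfs(s, None)
    return found
```

The recursion is adequate for small examples; for large derivation graphs an iterative depth-first search would be needed to stay within Python's recursion limit.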



7 Conclusions 

We introduced here generic strategies for the systematic exploration of spaces of graphs. Our generative 
approaches use the Double Pushout formalism in order to derive new graphs. Since this task is of practical 
relevance in chemistry [7], we designed our framework and implementation with the aim of high efficiency 
in this particular domain of application. It is in no way restricted to this area, however. As an example we 
showed that our implementation of graph strategies can be used to treat high-level transformation rules on 
graphs that cannot be formulated as graph grammars with a finite number of rules because, for example, the 
size of the subgraph affected by the transformation is bounded not by the rule but only by the input graphs. 

Performance was a particular focus of our work, although this has not been discussed in detail here. We 
use state-of-the-art subgraph isomorphism methods and we heavily employ hashing techniques in 
order to check for graph isomorphisms. To infer proper derivations of new molecules with full or 
partial rule application, we do not use a straightforward method that enumerates all possible left-hand sides 
of derivations. Instead we employ partial rule applications, a method that shows theoretically as well as 
empirically a much better performance (a detailed discussion will be published elsewhere). 
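The role of hashing in isomorphism checks can be illustrated as follows. The hash function and the isomorphism algorithm used in the implementation are not specified here, so this sketch substitutes stand-ins: a few rounds of colour refinement as an isomorphism-invariant key, and a brute-force exact test that is only sensible for very small graphs. All names (`invariant`, `isomorphic`, `GraphStore`) are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # simple undirected graph as adjacency sets


def invariant(g: Graph, rounds: int = 3) -> Tuple:
    """Isomorphism-invariant key: isomorphic graphs always get the same key."""
    colour = {v: len(g[v]) for v in g}  # start from vertex degrees
    for _ in range(rounds):
        # New colour = old colour plus the sorted colours of the neighbours.
        colour = {v: (colour[v], tuple(sorted(colour[u] for u in g[v]))) for v in g}
    return tuple(sorted(colour.values()))


def isomorphic(g: Graph, h: Graph) -> bool:
    """Exact test by trying every vertex bijection (factorial time)."""
    gv, hv = sorted(g), sorted(h)
    if len(gv) != len(hv):
        return False
    for perm in permutations(hv):
        m = dict(zip(gv, perm))
        if all({m[u] for u in g[v]} == h[m[v]] for v in gv):
            return True
    return False


class GraphStore:
    """Keeps one representative per isomorphism class, bucketed by key."""

    def __init__(self) -> None:
        self.buckets: Dict[Tuple, List[Graph]] = {}

    def add(self, g: Graph) -> bool:
        """Insert g and return True if it is new up to isomorphism."""
        bucket = self.buckets.setdefault(invariant(g), [])
        if any(isomorphic(g, h) for h in bucket):
            return False
        bucket.append(g)
        return True
```

The key only narrows the candidates: a 6-cycle and two disjoint triangles receive the same key under colour refinement, so the exact test within a bucket remains necessary.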

Several mathematical techniques exist for analyzing chemical reaction networks such as those created by our 
strategy framework, and we plan to apply them. Two of the most prominent ones are Flux Balance 
Analysis [?] and Elementary Mode Analysis [?]. Note that these methods are usually not applied to 
dynamically created reaction networks as produced by our framework. We aim at detecting new well-defined 
chemical reaction patterns. Furthermore, we expect to identify highly connected subgraphs in chemical 
spaces that are connected via a small number of bridging reactions, similar to our observation for the 
Catalan game. 

Figure 10: The derivation graph created during expansion of level 25 of the Catalan game. A path equivalent 
to a solution is highlighted. 
A Additional Diels-Alder Chemistry Figure 



Fig. 11 shows the derivation graph obtained from the breadth-first expansion of the Diels-Alder chemistry. 
The number of expansion steps is only 2. 




Figure 11: The derivation graph resulting from evaluating the breadth-first expansion strategy Qbfs = 
addSubset[{isoprene, cyclohexadiene}] → repeat[Qp, 2] (on an empty graph state). To minimize clutter, 
the vertex with isoprene and the corresponding edges are not shown. 
B Transformation Rules for the Catalan Game 



The following sections contain visualizations of the rules used in the strategy to solve a level in the Catalan 
game. Vertices and edges shown in red are those being changed during the transformation. For some vertices 
the change is only a change of label. For those vertices, the label in the context graph, K, is given in the 
format "L | R", with L and R being the labels in the left and right side of the rule, respectively. 